{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {
      "metadata": {},
      "source": [
        "<td>\n",
        "   <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=190/></a>\n",
        "</td>"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "<td>\n",
        "<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/batches.ipynb\" target=\"_blank\"><img\n",
        "src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
        "</td>\n",
        "\n",
        "<td>\n",
        "<a href=\"https://github.com/Labelbox/labelbox-python/blob/master/examples/basics/batches.ipynb\" target=\"_blank\"><img\n",
        "src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
        "</td>"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Batches\n",
        "https://docs.labelbox.com/docs/batches"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "* A batch is collection of data rows.\n",
        "* A data row cannot be part of more than one batch in a given project.\n",
        "* Batches work for all data types, but there can only be one data type per project.\n",
        "* Batches can not be shared between projects.\n",
        "* Batches may have data rows from multiple datasets.\n",
        "* Currently, only benchmarks quality settings is supported in batch projects\n",
        "* You can set the priority for each batch."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "!pip install \"labelbox[data]\" -q"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "import labelbox as lb\n",
        "import random\n",
        "import uuid\n",
        "import json"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## API key and client\n",
        "Provide a valid API key below in order to properly connect to the Labelbox Client."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Add your API key\n",
        "API_KEY = \"\"\n",
        "client = lb.Client(api_key=API_KEY)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Create a dataset and data rows"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Create a dataset\n",
        "dataset = client.create_dataset(name=\"Demo-Batches-Colab\")\n",
        "\n",
        "uploads = []\n",
        "# Generate data rows\n",
        "for i in range(1,9):\n",
        "    uploads.append({\n",
        "        'row_data':  f\"https://storage.googleapis.com/labelbox-datasets/People_Clothing_Segmentation/jpeg_images/IMAGES/img_000{i}.jpeg\",\n",
        "        \"global_key\": \"TEST-ID-%id\" % uuid.uuid1(),\n",
        "    })\n",
        "\n",
        "data_rows = dataset.create_data_rows(uploads)\n",
        "data_rows.wait_till_done()\n",
        "print(\"ERRORS: \" , data_rows.errors)\n",
        "print(\"RESULT URL: \", data_rows.result_url)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Setup batch project"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "project = client.create_project(\n",
        "  name=\"Demo-Batches-Project\",                                 \n",
        "  media_type=lb.MediaType.Image\n",
        ")\n",
        "print(\"Project Name: \", project.name, \"Project ID: \", project.uid)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Create batches"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### Select all data rows from the dataset\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "client.enable_experimental = True\n",
        "\n",
        "export_task = dataset.export()\n",
        "export_task.wait_till_done()\n",
        "\n",
        "data_rows = []\n",
        "\n",
        "def json_stream_handler(output: lb.JsonConverterOutput):\n",
        "  data_row = json.loads(output.json_str)\n",
        "  data_rows.append(data_row)\n",
        "\n",
        "\n",
        "if export_task.has_errors():\n",
        "  export_task.get_stream(\n",
        "  \n",
        "  converter=lb.JsonConverter(),\n",
        "  stream_type=lb.StreamType.ERRORS\n",
        "  ).start(stream_handler=lambda error: print(error))\n",
        "\n",
        "if export_task.has_result():\n",
        "  export_json = export_task.get_stream(\n",
        "    converter=lb.JsonConverter(),\n",
        "    stream_type=lb.StreamType.RESULT\n",
        "  ).start(stream_handler=json_stream_handler)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "global_keys = [data_row[\"data_row\"][\"global_key\"] for data_row in data_rows]\n",
        "print(\"Number of global keys:\", len(global_keys))"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Select a random sample\n",
        "This method is useful if you have large datasets and only want to work with a handful of data rows"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "sample = random.sample(global_keys, 4)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Create a batch\n",
        "This method takes in a list of either data row IDs or `DataRow` objects into a `data_rows` argument or global keys into a `global_keys` argument, but both approaches cannot be used in the same method."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "batch = project.create_batch(\n",
        "  name=\"Demo-First-Batch\", # Each batch in a project must have a unique name\n",
        "  global_keys=sample, # A list of data rows or data row ids\n",
        "  priority=5 # priority between 1(Highest) - 5(lowest)\n",
        ")\n",
        "# number of data rows in the batch\n",
        "print(\"Number of data rows in batch: \", batch.size)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Create multiple batches\n",
        "The `project.create_batches()` method accepts up to 1 million data rows.  Batches are chunked into groups of 100k if necessary, which is the maximum batch size. This method takes in a list of either data row IDs or `DataRow` objects into a `data_rows` argument or global keys into a `global_keys` argument, but both approaches cannot be used in the same method.\n",
        "\n",
        "This method takes in a list of either data row IDs or `DataRow` objects into a `data_rows` argument or global keys into a `global_keys` argument, but both approaches cannot be used in the same method. Batches will be created with the specified `name_prefix` argument and a unique suffix to ensure unique batch names. The suffix will be a 4-digit number starting at `0000`.\n",
        "\n",
        "For example, if the name prefix is `demo-create-batches-` and three batches are created, the names will be `demo-create-batches-0000`, `demo-create-batches-0001`, and `demo-create-batches-0002`. This method will throw an error if a batch with the same name already exists.\n",
        "\n",
        "In the code below, only one batch will be created, since we are only using the few data rows we created above. Creating over 100k data rows for this demonstration is not sensible, but this method is the preferred approach for batch creation as it will gracefully handle massive sets of data rows."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# First, we must create a second project so that we can re-use the data rows we already created.\n",
        "second_project = client.create_project(\n",
        "  name=\"Second-Demo-Batches-Project\",                                 \n",
        "  media_type=lb.MediaType.Image\n",
        ")\n",
        "print(\"Project Name: \", second_project.name, \"Project ID: \", second_project.uid)\n",
        "\n",
        "# Then, use the method that will create multiple batches if necessary.\n",
        "task = second_project.create_batches(\n",
        "  name_prefix=\"demo-create-batches-\",\n",
        "  global_keys=global_keys,\n",
        "  priority=5\n",
        ")\n",
        "\n",
        "print(\"Errors: \", task.errors())\n",
        "print(\"Result: \", task.result())"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Create batches from a dataset\n",
        "\n",
        "If you wish to create batches in a project using all the data rows of a dataset, instead of having to gather global keys or ID and using subsets of data rows, you can use the `project.create_batches_from_dataset()` method. This method takes in a dataset ID and creates a batch (or batches if there are more than 100k data rows) comprised of all data rows not already in the project.\n",
        "\n",
        "The same logic applies to the `name_prefix` argument and the naming of batches as described in the section immediately above."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# First, we must create a third project so that we can re-use the data rows we already created.\n",
        "third_project = client.create_project(\n",
        "  name=\"Third-Demo-Batches-Project\",                                 \n",
        "  media_type=lb.MediaType.Image\n",
        ")\n",
        "print(\"Project Name: \", third_project.name, \"Project ID: \", third_project.uid)\n",
        "\n",
        "# Then, use the method to create batches from a dataset.\n",
        "task = third_project.create_batches_from_dataset(\n",
        "    name_prefix=\"demo-batches-from-dataset-\",\n",
        "    dataset_id=dataset.uid,\n",
        "    priority=5\n",
        ")\n",
        "\n",
        "print(\"Errors: \", task.errors())\n",
        "print(\"Result: \", task.result())"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Manage Batches\n",
        "Note: You can view your batch data through the **Data Rows** tab."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### Export Batches"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "Batches will need to be exported from your project as a export parameter. Before you can export from a project you will need an ontology attached."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "#### Create and Attach Ontology to Project"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "classification_features = [\n",
        "    lb.Classification(\n",
        "        class_type=lb.Classification.Type.CHECKLIST,\n",
        "        name=\"Quality Issues\",\n",
        "        options=[\n",
        "            lb.Option(value=\"blurry\", label=\"Blurry\"),\n",
        "            lb.Option(value=\"distorted\", label=\"Distorted\")\n",
        "        ]\n",
        "    )\n",
        "]\n",
        "\n",
        "ontology_builder = lb.OntologyBuilder(\n",
        "    tools=[],\n",
        "    classifications=classification_features\n",
        ")\n",
        "\n",
        "ontology = client.create_ontology(\n",
        "  \"Ontology from new features\",\n",
        "  ontology_builder.asdict(),\n",
        "  media_type=lb.MediaType.Image\n",
        ")\n",
        "\n",
        "project.setup_editor(ontology)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "#### Export from Project"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "client.enable_experimental = True\n",
        "\n",
        "export_params = {\n",
        " \"attachments\": True,\n",
        "  \"metadata_fields\": True,\n",
        "  \"data_row_details\": True,\n",
        "  \"project_details\": True,\n",
        "  \"performance_details\": True,\n",
        "  \"batch_ids\" : [batch.uid] # Include batch ids if you only want to export specific batches, otherwise,\n",
        "  #you can export all the data without using this parameter\n",
        "}\n",
        "filters = {}\n",
        "\n",
        "# A task is returned, this provides additional information about the status of your task, such as\n",
        "# any errors encountered\n",
        "export_task = project.export(params=export_params, filters=filters)\n",
        "export_task.wait_till_done()"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "data_rows = []\n",
        "\n",
        "def json_stream_handler(output: lb.JsonConverterOutput):\n",
        "  data_row = json.loads(output.json_str)\n",
        "  data_rows.append(data_row)\n",
        "\n",
        "\n",
        "if export_task.has_errors():\n",
        "  export_task.get_stream(\n",
        "  \n",
        "  converter=lb.JsonConverter(),\n",
        "  stream_type=lb.StreamType.ERRORS\n",
        "  ).start(stream_handler=lambda error: print(error))\n",
        "\n",
        "if export_task.has_result():\n",
        "  export_json = export_task.get_stream(\n",
        "    converter=lb.JsonConverter(),\n",
        "    stream_type=lb.StreamType.RESULT\n",
        "  ).start(stream_handler=json_stream_handler)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Export the data row iDs\n",
        "data_rows = [dr for dr in data_rows]\n",
        "print(\"Data rows in batch: \", data_rows)\n",
        "\n",
        "## List the batches in your project\n",
        "for batch in project.batches():\n",
        "    print(\"Batch name: \", batch.name , \"  Batch ID:\", batch.uid)\n"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Archive a batch"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Archiving a batch removes all queued data rows in the batch from the project\n",
        "batch.remove_queued_data_rows()"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Clean up \n",
        "Uncomment and run the cell below to optionally delete the batch, dataset, and/or project created in this demo."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Delete Batch\n",
        "#batch.delete()\n",
        "\n",
        "# Delete Project\n",
        "#project.delete()\n",
        "\n",
        "# Delete DataSet\n",
        "#dataset.delete()"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    }
  ]
}